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ABSTRACT 

Intensity mapping experiments survey the spectrum of diffuse line radiation rather than detect individual 
objects at high signal-to-noise ratio. Spectral maps of unresolved atomic and molecular line radiation contain 
three-dimensional information about the density and environments of emitting gas and efficiently probe cos¬ 
mological volumes out to high redshift. Intensity mapping survey volumes also contain all other sources of 
radiation at the frequencies of interest. Continuum foregrounds are typically ~ 10^-10^ times brighter than 
the cosmological signal. The instrumental response to bright foregrounds will produce new spectral degrees 
of freedom that are not known in advance, nor necessarily spectrally smooth. The intrinsic spectra of fore¬ 
grounds may also not be well known in advance. We describe a general class of quadratic estimators to analyze 
data from single-dish intensity mapping experiments and determine contaminated spectral modes from the data 
themselves. The key attribute of foregrounds is not that they are spectrally smooth, but instead that they have 
fewer bright spectral degrees of freedom than the cosmological signal. Spurious correlations between the sig¬ 
nal and foregrounds produce additional bias. Compensation for signal attenuation must estimate and correct 
this bias. A successful intensity mapping experiment will control instrumental systematics that spread vari¬ 
ance into new modes, and it must observe a large enough volume that contaminant modes can be determined 
independently from the signal on scales of interest. 

Subject headings: methods: data analysis - methods: statistical - (cosmology:) diffuse radiation - (cosmol¬ 
ogy:) large-scale structure of universe 


1. INTRODUCTION 

Intensity mapping is an emerging technique for cosmolog¬ 
ical observation. It uses atomic or molecular transition radia¬ 
tion to tomographically map large volumes of the universe. 
These data volumes contain a combination of information 
about the abundance, environment, and velocity of emitters. 
Tomographic line surveys offer the potential to capture the 
dyna mic universe in epoc hs that are otherwise difficult to ob¬ 
serve (iMadau et al.11199'^ and to probe a variety of galacti c 
environments in aggregate (lUzgil et al.ll2014t iLi et al.ll2015[) . 
The recovery of many modes could allow intensity surveys 
to compete with dark ene rgy constraints from standard spec¬ 
trosco pic galaxy surveys (IChang et al.ll2008t iLoeb & Wvithel 

|200l). 

Intensity mapping shares some parallels with studies of cos¬ 
mic background radiation and spectroscopic galaxy surveys. 
Like background radiation studies, intensity mapping exper¬ 
iments use sensitive receivers to map diffuse emission. In 
contrast, a survey to resolve the individual sources of emis¬ 
sion requires significantly higher sensitivity and resolution, 
both of which drive costs or reduce scope. Intensity mapping 
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only requires resolution to reach cosmologically interesting 
scales, or scales with large enough fluctuations to secure a de¬ 
tection. Additionally, an intensity mapping survey is sensitive 
to the integral of the luminosity function, taking advantage of 
all the emitted radiation that is available, not just the bright¬ 
est sources. However, the lack of source discrimination also 
makes intensity mapping survey volumes significantly more 
difficult to interpret than spectroscopic galaxy surveys. An 
intensity survey generally has not just the line emission but 
also all other sources of continuum and transition radiation 
from other redshifts. 

This paper describes a method for estimating the power 
spectrum of intensity mapping volumes, subject to bright 
foreground emission and instrumental response. We inh erit 
from the framework (iTegmarkll 19971 iTeginark et al.|[T998h of 
optimal quadratic estimators and exten d methods that can be 
applied to single-dish intensify surveys (iLiu & Tegmarkl201 ll 

iDillon et al.ll2013[l2014HSwitzer et al.ll2013h . Rather than us- 

ing a fully optimal estimator, we construct a more generic 
estimator and new methods for handling the impact of the 
instrumental beam and foreground cleaning. Treatment of 
foregrounds is our primary focus, which in a nutshell trans¬ 
lates into specifying the most effective foreground covari¬ 
ance matrix to down-weight contamination. We argue that 
the data themselves are the best source of information about 
foreground covariance, especially in light of the instrument’s 
response to bright foregrounds. 


Inte nsity mapping was or i ginally developed for 21 cm radi¬ 
ation (iHogan & R^ll979lIScott &. Reeslll990l iMadau et al.l 
[1993), but has been studied for several other lines (in 


increasing freauencv): deuterium (ISigurdson & Eurlanetto 

2006 

), •^He-F (iMcOuinn & Switzeil 20091: Bagla & Loeb 

2009 

E CO (IRighi et all 20081 ICarillill201 1 

; iLidz et al. 

2011; 

Brevsse et al.l 20141: Li et al. 

201.5h. Cii ( 

Gong et al. 

2012; 

Silva et al.l 20141: lUzgil et al. 

12014 lYue et all I201.5h 

Lya 
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( Gong et alJl2014t iPullen et ani20l4) . C and O fine-structure 
( Kusakabe & Kawasakill2012h . and X-ray lines (iHiitsi et alJ 
I2012h . 

Many experiments to search for redshifted line emission are 
planned or under way. These experiments deploy a wide range 
of technologies to observe across the frequency range of the 
lines of interest, from the present to the dark ages. We will 
focus specifically on non-interferometric, “single-dish” meth¬ 
ods that have a common set of simpler instrumental consid¬ 
erations. “Single dish” will refer to any single-aperture opti¬ 
cal path, including refractive designs. Throughout, the beam 
or point-spread functions are constant in time, axisymmetric, 
frequency dependent and may also have some off-diagonal 
Mueller mixing. While there are many parallels in interfer- 
ometers, they are beyond the scope o f this work ( see, e .g. 
iDillon et alJ([2(iTli : [Sh^^rgr^(l2()^^ et al.ldmi^ V 

Aside from the 21cm transition, most proposed or ac¬ 
tive intensity mapping experiments use a single - dish a rchi- 
tecture. CO has been sought by IPullen et al.l (l2013 h and 
propo sed by COMAP (iLi et alJl2015h . TIME (ICrites et alJ 
I 2 OII and SPHEREX ^Dore et al.l IMl) are proposed 
for Cll and Lya respectively. Within 21 cm efforts, 
single - dish instruments h a ve been used (GET , IChang et al.l 
(l201flh :lx asui et a D(l2()13h:ISwitzer et al.l(l2013h l or proposed 
(BINGO, Battve et al.l(l2012h f for studies at z 1. The meth¬ 
ods described here were developed for GBT studies. Even 
at low redshifts, 2 1cm interfero meters such as BAO BAB 
(iPober et alJ 1201 3h and CHIME (iBandura et al.l 120141) are 
needed to compete with dark energy constraints from optical 
galaxy surveys. Interferometers are the only realistic method¬ 
ology for 21 cm studies of reionization. 

Diffuse radiation from cosmologically redshifted atomic 
transitions has only been conclusively detected in cross¬ 
correlation with spectroscopic galaxy surveys. The cross¬ 
power with a density field is not biased by foregrounds, which 
instead boo st errors (as s uming that foregrounds are unrelated 
to signal). ICroft et aH (1201 5h recently detected Lya emis¬ 
sion intensity in cross-correlatio n with BOSS quasars from 
z = 2 — 3.5. iMasui et"^ (120131) detected 21 cm radiation at 
z ^ 1 in cross-correlation be tween dedicated GBT o bserva- 
tions and the WiggleZ survey (iDrinkwater et alJ 20101) and in¬ 
ferred the 21 cm contribution to the auto-power (iSwitzer et alJ 
I 2 OI 3 I). Bound s on th e auto-power at modest redshift date 
to iBebbingtonI (119861) . In the absence of a coeval spectro¬ 
scopic galaxy survey, such as at reionization, c ross-correlation 
with in tensity maps of other atomic lines (e.g. iVisbal & LoebI 
(120 lOM could secure a detection of cosmological structure, up 
to challenges of correlated foregrounds. As we will argue be¬ 
low, the principal challenge of intensity mapping experiments 
is that line radiation only makes up ~ 10“^ — 10“^ of the 
intensity of fluctuations in continuum radiation at most fre¬ 
quencies of interest. 

The optimal estimator for the power spectrum requires the 
covariance matrix of the maps, and specification of this co- 
variance is a central challenge of analysis of intensity map¬ 
ping data. While the non-Gaussianity of foregrounds could be 
distinguishable from the near-Gaussianity of the signals, we 
will not consider separation using higher-point statistics. The 
full foreground covariance of a 3D survey is an Npi^ x Wpix 
matrix for Wpix total map pixels and hence already requires 
an enormous amount of information about foregrounds and 
the instrument response to foregrounds. We have little prior 
knowledge of either. 

Common approaches to specifying foreground covari¬ 


ance amount to differe nt forms of dimensionality reduction. 
iLiu & TegmarkI (1201 Ih show that most of the covariance that 
distinguishes foregrounds is in the iz, v' directions rather than 
combinations involving angular separations. Eitting polyno¬ 
mials along the lines of sight corresponds to an ansatz of z/, v'- 
only covariance co ntributed by those polynomial modes (e.g. 
IWang et aH ([200^). These ansatzes can be better tuned to as- 
trophysical foregrounds by using models of the emission. As- 
trophysical synchrotron intensity is thought to be described 
by a li mited number of spectral modes (e.g. ILiu & TegmarkI 

(EoH). 

If foregrounds are bright and the instrument response is not 
sufficiently well understood, then misspecification of the fore¬ 
ground covariance can result in significant contamination in 
the maps that is not down-weighted. Eor foregrounds 10^ 
times the signal, an unattributed 1% error in calibration at one 
frequency could result in contamination that is 10 times larger 
than the signal. Worse still, if the spectral calibration varies in 
time, then each line of sight effectively sees a different bright 
foreground. These may even form a complete basis of bright 
spectral modes, making the signal indistinguishable. Earaday 
rotation through polarizatio n leakage into inten sity in radio 
surveys is another example (iMoore et al.ll2013l) where addi¬ 
tional spectral modes can be produced by the instrumental re¬ 
sponse. 

The position we take in this paper is that (1) some best- 
effort calibration has been applied, but that there is residual 
structure in the map related to the instrument, and (2) the 
intrinsic foreground cannot be modeled well in advance. In 
the regime of high foregrounds, measurements of foregrounds 
from other instruments or wavelengths may not have the fi¬ 
delity to be useful for foreground subtraction. Intensity map¬ 
ping surveys will often be the deepest surveys available in a 
region. This is ultimately due to the dimness of the atomic 
radiation, but another factor is that many 2D surveys stop in¬ 
tegrating beyond the confusion limit. A typical intensity map¬ 
ping experiment can benefit from thermal noise levels well 
below spatial confusion. These factors argue that the inten¬ 
sity survey volumes will be the best sources of information 
about foreground covariance rather than prior models of the 
instrument or intrinsic foreground emission. 

Even if the relevant foreground covariance is separable as 
V, v' blocks, we are unlikely to estimate that covariance ma¬ 
trix at full rank from independent sight lines in the data. The 
final dimensionality reduction we assume is that the v' co- 
variance estimated from the data will have a few dominant 
eigenvectors that are measured with high signal-to-noise ra¬ 
tio, while the remaining data are dominated by the cosmo¬ 
logical signal and thermal noise. The foreground eigenvec¬ 
tors are spectral degrees of freedom that can be projected 
out of each line of sight. Determination of contaminated 
modes in the data themselves has been e xploited in GBT dat a 
(IChang et al.| |201(i|: |Masui et al.l 120131: ISwitzer et al.l 120131) . 
GMRT (Pac^a et^l I2OI3I) . and most recently in PAPER 
(e.g. lAli et al.fi 20151) ). Blind methods have be en considered 
for SKA (IWolz et alJE^ lAlonso et al.ll2015l) a nd BINGO 
(iBigot-Sazv et al.l 120151) . ISwitzer & Liu (l2014h develop a 
similar method for monopole signals. The success of this 
method relies on (1) whether instrument response to bright 
foregrounds can be explained by fewer spectral modes than 
the cosmological signal and (2) whether these modes can be 
determined from the data themselves, independently from the 
signal. These requirements relate to the rank of foregrounds 
rather than spectral smoothness. 
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The first requirement is intuitive and says that the signal 
must have degrees of spectral variation that are orthogonal 
to the foreground modes that are removed. The second re¬ 
quirement arises from the fact that the foreground modes are 
determined from the data themselves, which have foreground, 
signal, and noise. Spurious correlations between foreground 
and signal result in residual covariance in the maps that is 
anticorrelated with the cosmological signal. This general ef¬ 
fect is familiar from the ILC bias (see, e.g. lEfstathiou et alJ 
(I2009M . Especially on scales that are large compared to the 
survey volume, there are too few realizations of signal fluc¬ 
tuations for the spurious correlation with the foregrounds to 
“average down.” In this case, it is difficult to disentangle sig¬ 
nal from foreground. 

Section 12] briefly reviews continuum foreground levels for 
several transitions of interest and de scribes data from the 
GBT-wide survey (iSwitzer et al.ll201^ that we use as an ex¬ 
ample throughout. SectionjSjdevelops the general (potentially 
suboptimal) quadratic estimator, the skeleton prescribed by 
the optimal estimator, and ways of calibrating the analysis us¬ 
ing Monte Carlo simulations. Section |4| builds up the for¬ 
malism of mode removal, line-of-sight cleaning, and rules 
of thumb for the impact of cleaning known modes. Rather 
than known modes, spectral contamination can be determined 
from the data themselves (Section|5]l, leading to modifications 
for rules of thumb of signal loss and estimation of transfer 
functions. Finally, Sections |6] and |7] describe the procedure 
for assembling final power spectral estimates using subseason 
cross-powers, weighted 2D to ID averaging, and development 
of errors. 

2 . CONTINUUM FOREGROUNDS AND DATA USED 
2.1. Review of Continuum Foreground Levels 

This section briefly reviews the literature and estimates 
magnitudes of continuum foregrounds. The goal is not to 
make a precise determination, but instead to argue that in¬ 
tensity mapping has a common set of challenges. In detail, 
foreground challenges will vary across the bands and survey 
strategies. For example, an experiment that needs wide sky 
areas may not be able to avoid bright extragalactic sources, 
galactic emission, or zodiacal light, while small areas can be 
better tuned. To get a rough understanding of irreducible con¬ 
tinuum levels, we will consider fluctuations in extragalactic 
radiation on angular scales where the signal has order-unity 
fluctuations. Here the mean line emission intensity serves as 
a proxy. Smaller angular scales may have higher signal vari¬ 
ance, but a study of signal vs. continuum emission on dif¬ 
ferent angular scales is deferred to future work for particular 
lines and redshifts. 

Considering 21cm first, iMasui et'HI (120131) show inten¬ 
sity maps at 800 MHz {z ~ 0.8) with foreground fluctu¬ 
ations of order a kelvin, while the signal fluctuations are 
~ 0.2 mK. While the 21 cm reionization signal is brighter, the 
foregrounds are commensurately brighter because of the syn¬ 
chrotron spectral index, yie lding a similar challe nge. Consid¬ 
ering C O( 1 -0) at llSGHz. lBrevsse et alJ (120141) and lLi et alJ 
(120151) suggest mean temperatures of ^ 1 /rK at z ~ 3. The 
scales of interest at tens of arcminutes are reas onably analo¬ 
gous to the Cosmic Background Imager (CBI) (iPearson et alJ 
120031) . which finds fluctuations in the raw maps at the level 
of several hundred at 30 GHz. (Note that some sources 
could be cleaned or masked based on catalogs, but cosmic mi¬ 
crowave background [CMB] would remain.) By z ^ 8, extra¬ 


galactic and galactic synchrotron become more problematic, 
while the mean brightness is expected to be a similar order of 
magnitude (iLidz et al.ll201 ll) . 

Mov ing to Cll (157.7/rm) at reionization, iSilva et all 
(120141) estimate a mean intensity of 4 x lO^Jysr”^ for 
z = 5.3 — 8.5 (300-200GHz), while the extragalactic fluc¬ 
tuations on scales of se veral arcmin in the A tacama Cosmol¬ 
ogy Telescope (A .CT~) (iD iinner et 'ani2013[) and South Pole 
Telescope (SPT) (ISchaffer et al.ll201 lb data in this band are 

- 100 /iK at 220GHz, or - 5 x lO^^Jysr-^ (CMB domi¬ 
nated). CII emission i s thought to reach a maximum at z Ri 1, 
at ~ 5 X 10^ Jysr~^ dUzgil et al.ll2014l) . while Herschel AT¬ 
LAS (lEales et alJl2O10b shows cosmic infrared background 
fluctuations ranging over ^ 2 x 10^ Jy sr~^ at 3 50 gLvci. 

For Lya (10.1 eV) manning. ICroft et (120151) measure the 
mean surface brightness in cross-correlation between quasars 
and spectra, finding vli, = (0.74 ± 0.17) nWm“^sr“^ (at 
4500A) across z = 2 — 3.5, ^ 21 — 35 hig her than pre¬ 
viously expected (see, e.g. iPullen et'di (12014b '). These red- 
shifts span the expected peak of emission from high star 
formation rates. Reported mean backgrounds in the BOSS 
spectra include all sources of radiation, including terres¬ 
trial, and are vl,y ^ 50nWm~ ^sr~^. For lower red- 
shifts, GALEX (iMurthv et al.ll2010b data suggest astrophys- 
ical backgrounds of ~ 10 nWm“^sr“^ for 5 .1 — 8.4 e'V, simi¬ 
lar to the general cosmic optical background (iHauser & DwekI 
2QQ1J). In the reioniz a tion er a where Lya has shifted to ~ ^m, 
Levenson & Wrighi (120081) And that the extragalactic mean 
contribution at 3.6 ^m is 9 nWm“^sr“^. 

In summary, typical continuum contamination to intensity 
surveys is 10^ — 10^ times the line contribution. The underly¬ 
ing challenge is the small fraction of total luminosity emitted 
through line radiation. Instrumental response needs to be con¬ 
trolled at a subpercent level, commensurate with the bright¬ 
ness of the foregrounds. 

We will conside r emission of a sing le line rather than a 
more general SED (Ide Putter et al.l20T4b and ne glect interlop¬ 
ers at other redshifts fe.g. iBrevsse et all (l201-5h '). which have 
received more attention than instrumental response to bright 
continua. 

2.2. Data Used to Demonstrate the Method 

We will use data from the GBT-wide survey to give con¬ 
text to the estimator described here. Previous publications 
review the o bservations and desc ribe the power spectrum 
of GBT data (ISw itzer et al.l 120131) and the cross -correlation 
(iMasui et al.l 120131) with the WiggleZ survey dBl^ee^^ 
1201 lb. No results from new data are reported. IMasui et^ 
(12013b describe the observations in more detail. The GBT- 
wide intensity survey used the prime-focus receiver to map a 

- 7° X 4.3° region from 700 to 900 MHz with FWHM - 0.3° 
and 256 spectral bands. 

Our starting point will be the map that has been estimated 
from time-ordered data. A framework for estimating maps is 
well established from CMB analysis and depends on partic¬ 
ulars such as noise correlations and frequency masking. The 
details of the calibration, radio frequency interference (RFI) 
mitigation, and map making used to produ ce th e maps used 
here can be found in iMasuil (120131) . Section ISTI describes the 
calibration of GBT data in regard to the spectral structure of 
contaminated modes. 

We use Ga ussian signa l realiz ations of the Empirical- 
NL model of iBlake et all (i2^ . which uses HALOFIT 
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(ISmith et alJl2003h for nonlinear power, Kaiser redshift dis¬ 
tortions, and streaming of the Lorentz form with (t„ = 
300 h km/s. To agree with Empirical-NL, we use flm = 0.27, 
S2a = 0.73, = 0 -166, h = 0.72, and Us = 0.96 

from iKomatsu et ahl (I2009h . These parameters are also used 
to translate the observed regions into comoving Cartesian co- 
ordinates. W e use approximations to the growth factor from 
iKasail (120101) . The brightness temperature of the 21 cm line is 
taken to be 


Tb{z) = To 


VtHl{z) 


10-3 

with To = 0.39 mK. 


rtjn + ^^ a (1 + z ) ^ 

- 1/2 

■1 + 2 ' 

0.29 


2.5 


1/2 

5 

( 1 ) 


3. THE QUADRATIC ESTIMATOR 
3.1. Map Notations 

The intensity survey produces maps at several frequencies. 
We can represent these maps as a matrix X with dimensions 
No X Ng, where is the number of frequency slices and 
Ng is the number of angular pixels observed. This is a stack 
of 2D maps, where the map at each frequency is unraveled 
into an Ag-long vector. An alternative is to unravel the entire 
3D volume into a single vector x of length ■ Ng. These 
representations are related through the operation vec(X) = x 
which unravels the stacked map matrix into a vector. The 
matrix form X has the useful property that operations can act 
explicitly on either the frequency or angular side as AXB, for 
A and B here, respectively. While this looks like a quadratic 
conjugation of X, it is linear in x through the relation 

vec{AXB) = (B^ O A)x, (2) 

where 0 is the Kronecker product. Another perspective is 
that if a matrix W multiplying a; can be written separably as 
Wi, 0 NVg, then those weights can act independently on the 
spectral and spatial axes of X. We will use the X representa¬ 
tion of the data when it is convenient to call out this separable 
form, and x when we want a simple “cascaded” set of opera¬ 
tions to apply to the full map. Note that 2D intensity surveys 
directly analo gous to CMB cross -correlation can also be pur¬ 
sued (see, e.g IPullen et al.l(l2013h ): however, all cases studied 
here will be 3D. 


3.2. Quadratic Estimators 

In this section we revie w the general q uadratic estima¬ 
tor, following early work (iTegmarld Il997h and recent ap¬ 
plications to intensity mapping (iLiu & Tegm M^l201 IL120121 
iDillon et'd]l2013L 120141 ISwitzer et al.ll2013l) . We break with 
tradition somewhat by deriving expressions for a general 
quadratic estimator rather than specializing to the optimal 
case. 

Take a covariance model with Gaussian thermal noise N 
(initially ignoring foregrounds) and signal that is decomposed 
as a set of amplitudes pa times the covariance C _a of modes in 
the map. Throughout, commas will denote derivatives. Then 

C = {xx"^) =N + '^PaC,a- (3) 

a 

We will develop a parti cular C „ for {k±, A:||) modes of the 
power spectra in Section U3] 

Our goal is to estimate the amplitudes pa of the covariance 
from a given map x, and infer their errors and correlations. 


A general class of covariance estimators forms a quadratic 
combination of the data, subtracts a bias ba and then takes a 
linear combination of the q\a = Qa, as 

Qa — trt QaiC b(y 

p = Tiq. (4) 

The expectation value of the quadratic combination is 

(x'^Qox) = Tr(CQ,) (5) 

and is sensitive to the variance of the noise N as well as 
the signal I3y choosing to subtract a noise bias 

ba = Tr (NQq) based on a model for N, ija measures just 
the signal covariance. Einally, the matrix R takes a linear 
combination of the band powers ija to form a final estimate 
Pa. The vector q c ontains pseudo-powers in the language of 
iHivon et'H] (120021) . While it has mainly been de scribed in a 
role of decorrelating (iHamilton & Tegmarkll200()h band pow¬ 
ers, the final linear combination of band powers by R per¬ 
forms several roles as follows: (1) a normalization to ensure 
that Qa recovers an unbiased estimate of the signal (neglect¬ 
ing beam and foreground considerations), (2) a correction 
for signal attenuation from beam convolution and foreground 
down-weighting, and (3) an operation that decorrelates band 
powers. 

In the standar d treatment of quadratic estimators (see, e.g. 
ITegmarld (IT9^ ). one seeks to minimize the variance of the 
estimator p subject to the Lagrange constraint that it is an 
unbiased estimate of true p. Most literature develops expres¬ 
sions henceforth assuming an optimal estimator. In the case of 
the optimal estimator, the Eisher matrix is ubiquitous because 
the estimator can saturate the Cramer-Rao bound for Gaus¬ 
sian fields. In contrast, we will assume that is given and 
will in general be suboptimal. Section 13.31 derives the form 
of the optimal estimator to develop some intuition for good 
suboptimal estimators. The fully optimal estimator requires a 
complete model of the covariance for its optimal weights. In 
any near-term intensity mapping applications, both the signal 
covariance and the foreground covariance should be assumed 
to be unknown. Thermal noise of the instrument can be mea¬ 
sured well and is the only prior input to the estimator. We will 
develop expressions for generic Qq, that will be tuned to be 
more robust to these unknowns. 

With the choice ba = Tr(NQa), the expectation value of 
our estimator for the covariance model of Equation[3]is 

{qa)='^PpTr{C^pCla) 

0 

{p) = R(q) = RMp = Wp, (6) 

where we have identified M|a^ = Tr{C^pQa) as a mixing 
matrix and W = RM as the bandpower window functions. 
The origin of t he mixing matrix i s familiar from quadratic 
methods such as iHivon et al.l (120021) and is due to the correla¬ 
tion of Eourier modes sampled over a finite area (or alternately 
not having an orthonormal basis in a restricted survey area). 

The estimator p is then a window W that weights several 
modes of the true, underlying signal covariance amplitudes 
p. The matrix M is fully dictated by the estimator, but R 
must be chosen. One option is to pick R so that Wq,q, = 1, 
ensuring that pa is a unit multiple of pa- This does not mean 
that (pa) = Pa because Pa will generally be a combination of 
several band powers. Another choice is to pick Ra to give a 
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weighted average of band powers as a window function 

^WU^ = 1. (7) 

/3 


This constraint does not fully specify R, but we can impose 
an additional constraint for simplicity that R is diagonal and 
R|c(a = Ra- This matrix normalizes each band power but 
does not apply any sense of d ecorrelation, which is developed 
as a final step in Section lOl In this case, the full estimator is 

Pa — Ra{p^ QctX ^a)i (8) 

and the constraint in Equation|7]fixes 


R 


a. 


^Tr(C,^Q„) 

P 


(9) 


We will refer to R as the “rote” normalization because it 
handles any arbitrary multipliers of the quadratic data com¬ 
bination. For example, if we scale all the estimators by 
7 , then Ra oc I/ 7 , correcting for the normalization. Pencil- 
beam surveys and slabs (one beam wide in a slice many beams 
long) will have significant mode coupling and may warrant a 
different type of analysis. Here we consider application to 
map r egions with essenti ally unifo rm sky coverage, suc h as 
ACT (iDiinner et aHlMTl and SPT (ISchaffer et al.fcOl ll) . 


3.3. The Optimal Estimator and Its Normalization 

So far, Qa has organized a generic quadratic combination 
of the data. In this section, we review the formally opti¬ 
mal estimator and the procedure it implies for analyzing the 
data. The optimal estimator provides a good starting point for 
constructing estimators, but subsequent statements will leave 
general rather than assume optimality. The covariance of 
the q values in Equation|4]is 


maps, reinforcing q as a pseudo-power as in iHivon et'H] 

The operation C q, is more complex in intensity mapping 
surveys because of the relation between the geometry of the 
observations and the desired 3D power spectrum, x repre¬ 
sents the calibrated intensity as a function of angular point¬ 
ing and frequency and can be interpolated onto a cube in 
units of ft,”^Mpc using a fiducial cosmology. Here angular 
slices translate approximately into spatial slices, and frequen¬ 
cies translate into distances to redshifts. The relation between 
redshift and distance is nonlinear, and the low-frequency end 
of the survey represents a larger spatial area than the high- 
frequency side. The survey is therefore approximately a trun¬ 
cated pyramid in comoving Cartesian coordinates. Any areas 
outside the pyramid can be given zero weights in the larger co¬ 
ordinate volume. If the angular region is large enough, there 
will be non-flat sky curvature in the spatial slices. Through 
these factors, there is not a 1 -to-l translation between obser¬ 
vation and comoving Cartesian coordinates, and interpolation 
inevitably leads to loss in fidelity. Consider linear interpola¬ 
tion schemes that can be represented as Pa;, where P moves a 
map X to Cartesian coordinates. It is not generally invertible, 
but so long as the discretization is fine enough, the observed 
9—v space maps can be translated to Cartesian and back again 
with little loss. 

Once in a grid of constant Cartesian dimensions, C „ can 
be understood as an operation that take s the 3D Fourier trans¬ 
form of both maps (iDillon et al.ll2013h and bins the k-vectors 
into annuli in a range of fcj_ „ to k±^a+i and fc|| ^ to A:||_a+i 
that define the range of the 2D band power pa- Let K be the 
linear Fourier operation so that x = K'PC~^x is the Fourier 
transform of the data in Cartesian coordinates. Then the bin¬ 
ning operation is mathematically 

qa = '^Ik&A^x{k)x{k)* /'^IkeA^, (14) 

k ' k 


Cov{qa,qp) = rr-(CQaCQ^). (10) 

Minimizing the covariance results in 

where the normalization is fixed by the Lagrange multiplier 
that forces the estimator to be unbiased. 

Based on the previous section, the mixing matrix of the op¬ 
timal estimator is 

M\ap = Tr{C-^C^aC-^C,fi), ( 12 ) 

which is just the F isher matrix F of the estimator (e.g. 
iTegmark et al.l (119971) 1. 

This is the starting point for more robust estimators. Note 
that Qq is only optimal when C is known exactly. When it is 
not known perfectly, the estimator is generally suboptimal but 
can be constructed to be unbiased. 

3.4. Optimal Estimation Procedure 

To get intuition for the term C a, note that in CMB analysis 
it is the outer product of spherical harmonics 


where IkeA^ is the indicator that is 1 in the /c-bin annu¬ 
lus Aa and 0 elsewhere. Figure [T] shows the number of 3D 
Fourier modes contributing to 2D band powers with logarith¬ 
mic spacing. This operation performs no weighting, assum¬ 
ing that C“^a: has noise isotropic in the k± annulus. The 
estimator has the form of an inner product x^BaX where 
Ba performs the binning of 3D Fourier cells. Combined, 
C.a = P^K^BqKP and can be understood as taking the 
Fourier transform of both maps in Cartesian coordinates and 
then binning onto the band power a. This can be easily paral¬ 
lelized across band powers by using the same Cartesian space 
conversion and Fourier transform. Note that is applied to 
the maps on both sides in observing coordinates of 0 and u. It 
will remain natural to discuss covariance in those dimensions 
rather than Cartesian coordinates because contamination nat¬ 
urally lives along i/. 

The matrix calculations throughout this paper are skeletons 
that put the proper form to procedures implemented in soft¬ 
ware. Putting together what we have so far, x^QaX can be 
written as a numerically convenient procedure: 

1. Weight both maps by their inverse covariance in observ¬ 
ing coordinates (and remove the mean map if needed). 


C,a=ya2/I where ya\i=Y£m{a){r^). (13) 

Hence, for the CMB, qa has the interpretation of the am¬ 
plitude of the spherical harmonic transform of the weighted 


2. Translate observing to Cartesian coordinates. 

3. Calculate the fast Fourier transform to both sides of the 
quadratic product. 
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Figure 1. Number of 3D Fourier modes contributing to a given band power in 
the GBT-wide survey. The survey region is 7° X 4.3° from 700 to 900 MHz 
in 256 spectral bins. Towai'd increasing and fey, the number of modes 
grows quadratically and linearly, respectively. The survey volume encom¬ 
passes several fundamental Fourier modes that appear as isolated bands. In 
subsequent plots, we ignore these lone harmonics. Even though the num¬ 
ber of available modes rapidly increases toward high k, information from 
k± > 0.4 /rMpc“^ is almost entirely suppressed by the GBT beam. 

4. Bin onto a band power. 

When considering one Fourier mode, the binning operation 
is separable (iLiu et al.ll2014^ as = D^D. In this case, 
it is useful to derive the map in observing coordinates that 
has information about a Fourier mode. Doing this requires 
transforming through P and back through P~^, which we as¬ 
sume to exist even though information could generally be lost 
in the repixelization. This “filter” for the modal information 
in a map becomes x'^ = P”^K^DaKPC“^a;. The map 
x'^ is the Fourier component a of the noise-weighted input 
map, and the complete quadratic estimator is simply x'Jx'^. 
In later sections, we will estimate rules of thumb for the im¬ 
pact of foreground operations on signal by considering these 
simple forms of inner products. 

3.5. Estimating the Normalization 

There is complete freedom in choosing R in Equation |6] 
For arbitrary R, W = RM translates theoretical expecta¬ 
tions p to the same space as the measured data p. If R is 
invertible, there is no information loss, and the choice of R 
relates to the presentation of data. For example, we argued 
that the choice of Equation|9]corrected for some multiplier 7 
in the estimator, but there is no reason it needs to. The theo¬ 
retical P{k) could be compared to the data through the same 
pipeline with the arbitrary 7 retained, effectively comparing to 
-fP{k). A non-invertible R would be a poor choice because 
information is lost, but any information that does get through 
would permit a comparison of theory and observation. 

The choice of R is especially relevant to intensity map¬ 
ping because signal is attenuated by the beam and foreground 
down-weighting. One outlook is that the theory P{k) should 
have the same beam and foreground treatment applied as in 
the real data and then be compared to the “raw” experimen¬ 
tal band powers, without making any effort to correct the ob¬ 
served band powers. This makes it difficult to intercompare 
experiments with different foreground properties and beams, 
or even the same experiment with different foreground treat¬ 
ments. We prefer to bring the measured p into the same foot¬ 
ing as the theoretical band powers p from P{k). This means 


correcti ng fo r the beam (Section iTbl l and foreground cleaning 
(Section l4.21 i. 

Normalizations take the form Tr{C^pCla) (see Equa- 
tion|9]l. The previous section described a software procedure 
to calculate x^Qx, but not the trace with an arbitrary ma¬ 
trix. We would like to estimate the normalization using the 
same well-defined pipeline that calculates x^Qx, rather than 
develop a separate matrix operation. This reuse reduces the 
complexity of software, enforces consistency through com¬ 
mon pipelines, and provides a convenient numerical imple¬ 
mentation. 

Tr{C^isQa) is the expectation value of band powers 
when pp = 1, so that we can estimate Ra in Monte Carlo by 
finding the mean qa = of white noise in the 

input Xp=i that is drawn from a covariance where pis = 1. In 
other words, R ensures that unit power in is unit power out. 
Monte Carlo estimation is efficient because there are typically 
many more map pixels than estimated band powers, so a sin¬ 
gle Monte Carlo involves significant averaging. The number 
of samples varies with estimator, but for GBT-wide, several 
hundred were typically sufficient. 

Rather than lump the normalization into a single factor 
that is estimated with simulations, we prefer to partitio n th e 
rote normalization, effects of beam convolution (Section iTbl l. 
and fore ground down-weighting as separate operations (Sec¬ 
tion |53]l. The rote normalization will ensure that the power 
spectral estimator of the spatially weighted maps properly re¬ 
covers inputs. The foreground transfer function accounts for 
attenuation of the cosmological signal by foreground down¬ 
weighting. Section |4] develops foreground treatment that is 
naturally partitioned into operations that avoid contaminated 
modes and weigh the survey based on thermal noise. Einally, 
the beam transfer function accounts for differences between 
the measured band powers and the inputs due to convolution 
by the instrumental response. We develop the beam transfer 
function first because it is simplest. 

3.6. Impact of the Instrumental Beam 

Diffraction limits the resolution of single-dish (aperture) 
and interferometer instruments (baseline). We will only con¬ 
sider the case of the single-dish instrument with an axially 
symmetric beam. Beams that are not axially symmetric have 
a non-isotopic impact on the data in fc-space. Also, typical 
surveys cover a region at several parallactic angles, so a beam 
that is not axially symmetric does not correspond to the sta¬ 
tionary convolution across the map. When the beam convolu¬ 
tion applies uniformly across the map, it is a multiplication in 
Eourier space, so let a particular Eourier mode be modulated 
as = BaC a- Throughout, the 2D band powers a com¬ 
bine the annulus in k^ and ky at constant k±_ = ^Jk^ + ky, 

so Ba already implicitly assumes an axisymmetric beam. The 
expectation value of the estimator is 

(fe) = ^ B pppTr{C^pQa,) = MBp (15) 
/3 

(p) = Rs(q) = RsMBp = WbP. (16) 

The window function W b = RsMB includes the effect of 
the beam, and Rb reflects that we will want a different nor¬ 
malization due to the impact of the beam. The convolution 
acts on the underlying band powers, which are the n mixed un- 
der M through the estimator, identically to, e.g. iHivon et’aH 
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(120021) . In a context like the optimal quadratic estimator, 
where Qq, = the estimator itself scales with 

the beam. In this case, M also scales as Ba and the rote nor¬ 
malization Equation |9] scales as so the effect cancels. 

We will neglect the impact of the beam on the estimator, as¬ 
suming fixed Qq for simplicity. 

Choose to keep the same rote normalization of the quadratic 
estimator as Equation |9] but extend the transformation to 

include a diagonal “beam transfer function” correction for the 
beam which is Rb = T^^R. The expectation value is 
then 

iP) = T^iRMBp = WbP- (17) 

The transfer beam function can be determined by enforcing 
Wb|q^ = 1. Again choose a diagonal = Tb|qq so 

that 

{T^y^Rc^B^ Tr{C,pCty = 1 (18) 

P 

where 

Rq = ^ ryc^pCtyBp /^ (i9) 

p ' p 

This is the ratio of simulations with beam convolution to those 
without, which can be estimated using the same pipeline as 
the data. Plugging in the value from the rote normalization, 
Equation|9]yields = Rq. With the choices here, the trans¬ 
fer function is the mixing-weighted beam. Figure |2] shows the 
beam transfer function estimated from simulations. The full 
estimator has become 

Pa = ByRa{x^QaX-ba). ( 20 ) 

This follows iTegmarkI (1 1 9971) rather than iHivon et al.l (|2002), 
where the data are multiplied by and then the beam 

is treated. Our model throughout is to normalize the band 
powers (including the effect of beam and foreground) before 
decorrelating. Because of the effects of spatial and spectral 
masking and foreground deweighting, the mixing matrix of an 
intensity survey is not as easily calculable as the CMB. Here 
decorrelation is a final step related to display of the data and is 
based on simulations (Section 16.3b rather than a closed-form 
calculation. 

The beam and its sidelobes generally broaden toward lower 
frequencies. As the sidelobes expand over the spatially vary¬ 
ing continuum foreground structure, they can produce spectral 
structure. This is anal ogous to the wed ge phenomenon in in¬ 
terferometers (see, e.g. iLiu et akl (120141) 1. In a single-dish set¬ 
ting with approximately uniform map coverage, we can con¬ 
volve all maps to a common resolution using the beam model. 
Figure|2]has no structure in because of this operation. 

4. CLEANING KNOWN FOREGROUND MODES 

The estimator developed so far has referred to general sig¬ 
nal and noise covariances. Thermal noise clearly belongs in 
the noise covariance N in Equation [3 Foregrounds are more 
challenging to attribute. Like thermal noise, they additively 
bias the power spectral estimates. Unlike thermal noise, they 
are not known accurately in advance, so the power spectral 
bias cannot be simply subtracted based on a model. They are 
also unavoidable astronomical signals, while thermal noise 
biases can be avoided by calculating the cross-power be¬ 
tween subseasons with uncorrelated noise (as we will do in 



10 '^ 10 ° 
[/iMpc“^] 


Figure 2. Transfer function of the GBT beam. This is derived as the ratio of 
signal simulations with beam to those without. It is analogous to in CMB 
analysis, and here it clearly acts along the spatial directions. Chromatic 
aspects of the beam are eliminated by convolving the maps to a common 
resolution, at the low-frequency end, with FWHM ~ 0.3°. 

Sectio n 16. lb. The perspective we take follows ISwitzer et alJ 
(120131) and fpillon et al.l (120141) by including any additive bi¬ 
ases from residual foregrounds in the final power spectrum. 
This assumes that any foregrounds that can be modeled are 
down-weighted or subtracted, and residuals are, by definition, 
not possible to model and subtract as a noise bias. Residual 
foregrounds are indistinguishable from signal. 

If foregrounds were drawn from a Gaussian distribution and 
known in advance, their covariance would be a complete de¬ 
scription. This covariance would enter and be down¬ 
weighted in the maps. In reality, we lack detailed knowledge 
of how a particular experiment will respond to bright fore¬ 
grounds in a particular region of the sky. Further, both ex- 
tragalactic fluctuations and the galaxy are non-Gaussian, so 
will not properly down-weight the contaminated modes. 
A more conservative choice is to admit some generally con¬ 
taminated modes that are fully projected out of the data rather 
than subjected to a more nuanced weight. These “avoided” 
modes are easy to include in the quadratic estimator. 

Take the data to be true sky signal tCgig plus some set of 
normalized modes Z multiplied by amplitudes a. The form 
of the quadratic estimator that we seek is then 

Qa — (^sig T Za) Qa(^sig ”b Zcf.) (21) 

where estimators are orthogonal to Z, or QqZ = 0. This 
choice forces to be uninfluenced by the modes in the ma¬ 
trix Z. The quadratic estimator can be derived in the usual 
way of minimizing the covariance subject to the constraint 
that the result is unbiased (a normalization). In addition, a 
Lagrange multiplier can force QqZ = 0. Appendix B4 of 
iTegmark et al.l (119981) shows that the optimal 

QQ = n^C"^c,QC-in, (22) 

n = l-Z(Z^C-iZ)-iZ^C-i. (23) 

Note that 11 is idempotent nil = H and so is a projection of 
the signal onto a space orthogonal to the modes Z. 

There are several equivalent viewpoints on foreground 
cleaning. Above, we have presented a constrained quadratic 
estimator that is orthogonal to contaminated modes. A 
Bayesian approach would jointly estimate contaminated 
modes with the signal modes in the map, but then marginal- 
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ize over contaminated modes to recover the signal. This 
marginali zation is func tionally equivalent to the orthogonal 
estimator (ISeliaklll998h . Both of these outlooks treat the con¬ 
tamination as additional “modes” to estimate, much like the 
signal. An alternative is to lump the contaminated modes 
with the noise covariance that is handled by the oper¬ 
ation. This too is connected to the above approach through 
the Woodbury identity 

C-^n= lim (C-bCT^ZZ^)-i = (C-bF)-^ (24) 

fOO 


The estimator avoids the foreground modes Z, so C rep¬ 
resents thermal noise, cosmological signal, and any residual 
foregrounds that are not explained by the modes Z. 

Again there is freedom in choosing C to yield slightly less 
optimal but more robust estimators. We will continue to ig¬ 
nore the signal covariance contribution in the inverse noise 
weights. Until a high-significance signal detection is reached, 
signal covariance will be both subdominant and imprecisely 
known. We will assume that C is purely thermal noise and is 
diagonal. translates simply into weighting pixels by the 
inverse of the variance from thermal noise, which is related to 
the integration time and effective Tsys(i^) of the survey. This 
prescription allows C to be precisely determined using sur¬ 
vey properties and c heck ed using the difference maps between 
subseasons (Section IbTl i. 

We can correct for the impact of foreground cleaning in the 
same scheme as the beam. Let the band power estimate be 

Pa, = iT^T^)-^R^{x'^Qa,x - 6,), (25) 

where is the beam transfer function correction and Ra is 
the rote normalization, as before. We can form a foreground 
cleaning transfer function using simulations that compare 
the ratio of the estimated band power a before and after fore¬ 
ground cleaning is applied to the data. More formally. 


F EpTr{C,pn^C-^C,^C-^U) 
“ EpTr{C,pC~^C^a.C-^) 


(26) 


The transfer function must be measured in 2D fc-space 
rather than ID fc shells because of the anisotropic nature of 
the cleaning. Foreground cleaning is not a real-space convo¬ 
lution like the beam, so is not fully described by a band power 
multiplier. The factors {T^T^)~^Ra give the proper nor- 
malizat ion b ut do not attempt to de-correlate the band powers. 
Section l63] describes how covariance simulations can be used 
to produce a final summary with independent errors per band 
power. 


4.1. Line-of-sight Cleaning: Known Modes 

The modes in Z could contain both spatial and spectral in¬ 
formation. Most of the foreground variance t hat distinguishes 
it fro m the signal is in the v' directions (iLiu & TegmarkI 
l2mlh . so Z can be separated into spectral modes per line of 
sight and yield a separable projection 11 = Hi, ® 1. Rather 
than identifying a handful of contaminated modes, it is conve¬ 
nient to work with a complete spectral foreground basis where 
only a handful of modes are projected out. Starting with a lim¬ 
ited set of foreground spectral modes u\, a complete basis can 
be constructed using a Gram-Schmidt process. (Superscripts 
without a numerical value are used as labels rather than pow¬ 
ers.) Assemble these complete basis vectors into columns of 
a matrix Uf that projects onto a “spectral foreground” basis. 


The matrix X representation of the map (Section ITTI) has di¬ 
mensions N^, X Nq and can be put in this basis as U^X. This 
combination is a stack of maps of the amplitudes of spectral 
modes. The line-of-sight projection operation becomes 

Ux = vec{[l - VfS{Vf C;;,Uf)-iUf C-\]X), (27) 

where we have added a diagonal selection matrix S that is 1 
for those modes that are subtracted and 0 for spectral modes 
that pass through. The inverse covariance in this case is C“J/, 
which under the assumptions here is just the i/, v' covariance 
of the thermal noise. This is a diagonal matrix derived from 
Tsys{v). For notational simplicity, we will assume that the 
thermal noise is constant across all frequencies so that = 
1 and 

na; = ?;ec((l-UfSU^)X), (28) 

using orthonormality of the modes Uf. This has the form of 
a simple projection of spectral modes. Experiments whose 
noise varies with frequency (or has masked frequencies) 
should use the -weighted form. Rather than carry around 
the full n, define the line-of-sight projection that acts on the 
left side of the map X as 

= 1 - UfSUf. (29) 

4.2. Cleaning Effectiveness and Direct Loss 

In this section, we use the variance as a simple measure of 
the impact of foreground cleaning, rather than the more com¬ 
plex bandpower. The variance is directly related to the eigen¬ 
value spectrum and modal structure of signal foregrounds and 
provides rules of thumb and intuition about the process of 
foreground cleaning. Let the v' covariance of a pure sig¬ 
nal map Xs be = A"g”^(XsXj’). Diagonalize the signal 
covariance as = UgA^Uj’, where As|ii = A| and Ug is 
made of the signal spectral eigenvectors u\. The signal vari¬ 
ance is 


i, = N^\xlx,) = A,-i(rr(XgXf)) (30) 

= rr(UgAgUn=E^fc’ (31) 

k 

where the expectation value is over signal realizations, and 
we have identified the Frobenius trace Tr(XgXj^). This is 
the simplest quadratic estimator, corresponding to Q = 1. 
Recall that the oper ation x'^ = devel¬ 

oped in Section U^ can filter all of the modal content of some 
band power a onto a new map x'^ so that x'^x'^ is the full 
quadratic estimator. 

Similar to the signal, pure foreground spectral covariance 
can also be decomposed into eigenmodes Uf as Cf = 
Ag"^(XfX^) = UfAfU^. After applying the cleaning n^, 
the trace of the foregrounds becomes 

€Ln = A,-iTr(n.XfXfn^) = ^ M. (32) 

z^cuts 

The variance of residual foregrounds in the map is the sum of 
the eigenvalues of the modes that were not projected out. 

The foreground cleaning operation does not null discrete 
signal modes because signal and foreground have a different 
basis. A good measure of this signal loss is to apply fore¬ 
ground cleaning to a signal map (n,yXs) and find the cross- 
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variance with the input signal map Xg, 

Cclean = Ng \Tr)) = + ^direct (33) 
^direct = -iV,-i(Tr(UfSUf XgXf)), (34) 

where we have identified the signal covariance from Equa¬ 
tion |3T] and defined the loss of signal as a direct result of 
cleaning as ^direct; which will be negative. Hence, the sig¬ 
nal variance in the cleaned map is the input signal variance 
minus some loss. We will show that this loss is the overlap 
of signal with the foreground modes that were projected out. 
Writing out the direct loss, 

arect = - E (35) 

j£cuts,fc 

which is the overlap of the signal modes with the foreground 
modes that are subtracted. We can develop a simple expres¬ 
sion by assuming that any foreground mode is uncorrelated 
with any random signal mode. In this case, spurious cor¬ 
relations scale as oc l/-\/Are^, where Nres,v is 

the number of independent modes of the signal over the fre¬ 
quency range. For example, at the highest k in the box, this 
approaches N,^, while for lower k there are relatively fewer 
spectral modes in the signal. If modes are removed in the 
sum over j G cuts, 

Cclean = 6 + ^direct - ( 1 - ) 6- (36) 

Detection of signal in the cleaned maps benefits as the number 
of signal degrees of freedom A^res.i/ exceeds the number of 
foreground modes removed, Nm- 

There will generally be some spurious correlation between 
foreground spectral modes and signal spectral modes. Under 
an assumption that the foregrounds are physically unrelated 
to the signal, there is no correlation to the signal, on average. 
Hence, spurious correlation is an issue of noise and signal 
loss rather than bias. (This can be violated when the sources 
of line radiation produce continuum emission, but that case is 
left for future work.) 

In practice, we expect that intensity mapping experiments 
will need to measure at least some contaminated modes from 
the data themselves. In this case, the signal influences the 
foreground modes. We next show that this produces a net anti¬ 
correlation of the cleaned foregrounds with the signal, result¬ 
ing in considerable bias on the largest scales in the map. This 
effect is fam iliar from ILC bias in CMB foreground clean¬ 
ing (see, e.g. lEfstathiou et ^ (120091) 1. and in general of blind 
methods. 

5. EMPIRICAL CLEANING 

The instrument response to bright foregrounds will not fol¬ 
low a Gaussian distribution or be fully known in advance. 
Some examples of instrument response are passband cali¬ 
bration, chromatic beam response, polarization leakage, and 
calibration stability. Experiments will make a best effort at 
calibrating these effects, but any differences will modulate 
how the experiment observes bright foregrounds. Further, we 
would like to avoid assuming that the intrinsic spectrum of 
contaminants is known in advance. 

A limited number of contaminated spectral modes can be 
estimated from the data themselves by finding the empirical 
i/, v' covariance C and its eigenvalue decomposition C = 


A"g"^XX^ = UAU^. Unlike the previous section, we do 
not take the average over signal realizations through (), as¬ 
suming instead that we only have access to one map X that is 
the sum of foregrounds, signal, and noise. Writing C in terms 
of these constituents, 

C = iV,-\Xf + X, + Xn)(Xf + X, + X„)^. (37) 

Initially neglect the noise contribution so that 

Q+f = iVg-i(XfX^ + XfXf + XsXf + X,Xf) 

= Cf + CA, (38) 

where Ca = ^-^(XfX^ -f X,Xf -f X,Xf), and s -f f 
denotes the fact that the covariance contains both signal and 
foregrounds. Thus, the eigenvalues that we estimate from C 
likely are dominated by foreground Cf , but also perturbed by 
the signal itself (and noise) through Ca- Write the updated 
eigenvalues of C as 

U,+f = Uf + A. (39) 

The data-cleaning operation is then with respect to these per¬ 
turbed eigenvectors, defining 

nr+f = 1 - (Uf + A)S(Uf + A)^. (40) 

This again projects a subset of contaminated modes from each 
line of sight. In comparison, the ideal cleaning would only 
remove foreground modes as 

= 1 - UfSU^. (41) 

There is not a general analytic theory describing the eigen¬ 
vectors of the sums of matrices. It is possible to write a rank- 
N update of the eigenvectors that gives some intuition that 
the signal “rotates” foreground modes, but that direction does 
not yield analytic expressions. The next section develops ex¬ 
pressions that are perturbative in small signal, and some an¬ 
alytic insights are possible. Ultimately, the properties of this 
cleaning need to be estimated numerically in transfer function 
simulations (Section l573l l. 

5.1. Example: GET Foreground Decomposition and 
Instrument Response 

FigureOshows the eigenvalue spectrum of foregrounds and 
noise in the GBT-wide survey. After removing 10 modes, 
the foreground fluctuations in map space are suppressed by 
~ 10^; however, the covariance remaining in the map is 
spread in a tail of modes at lower amplitude. Figure [3] also 
shows the eigenvalue spectrum of the i/, v' covar ianc e of the 
difference between maps of subseasons (Section fe.lb . which 
cancels constant astronomical signal. Before taking the map 
difference, we recalibrate each line of sight to isolate variation 
in spectral shape rather than amplitude. Beyond the brightest 
~ 10 modes, instrumental noise and time-varying spectral re¬ 
sponse become increasingly significant. Note that the eigen¬ 
value spectrum of the sum of covariances (noise, signal, and 
foreground) is generally not the sum of the spectra of their re¬ 
spective covariances. Hence, the the total eigenvalue spectra 
cannot be rigorously decomposed into the sum of signal, fore¬ 
ground, and noise parts. A well-designed experiment should 
have the majority of fo reground covariance explained by a 
few modes. Section lST^ describes general conditions in which 
signal can be recovered well. 
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Figure 3. Square root of the eigenvalue spectrum of the z/, v' covariance of 
the GBT-wide field, normalized to one for the largest foreground mode. The 
square root converts variance to rms temperature fluctuations in the map. The 
solid line shows the spectrum of the t/, v' covariance of the input maps, while 
the dashed line shows the spectrum of the v' covariance of the difference 
between maps of subseasons. The difference of subseason maps removes 
astronomical signal and foregrounds that are common across observ ation s 
and isolates any time-varying noise or instrumental systematics (Section rS.M . 
The largest 10^ of the rms can be explained by ~ 10 modes. Beyond that 
peak is a plateau of much smaller modes, which are increasingly dominated 
by noise. See Figure |4]for the first five eigenvectors. The downturn at high 
mode number reflects the impact of the finite number of spectrometer chan¬ 
nels, with some fraction cut due to RFI contamination. 

Figure |4] shows the largest five eigenvectors of the GBT- 
wide field. The largest mode is effectively the mean syn¬ 
chrotron emission across the survey. Commonly proposed 
smooth spectral functions (polynomials, power laws, fore¬ 
ground model eigenvectors) fail to explain both the small 
glitches and the overall, non-power-law structure. These 
residuals would still be hundreds of times larger than the sig¬ 
nal. It is possible to recalibrate the data such that the largest 
mode is spectrally smooth, based on the largest eigenvector. 
However, this does not improve the overall prospects for fore¬ 
ground deweighting. By the construction of the eigenvectors, 
the higher modes are specifically orthogonal to the variations 
of the first. Even if the mean synchrotron were forced to be 
smooth, it would have no impact on the remaining, indepen¬ 
dent contaminant modes. The brightest modes are very poorly 
described by standard series of orthogonal polynomials. At 
high enough order, a complete polynomial basis could explain 
any of these foregrounds; however, it would also explain and 
project out the signal. By discovering contaminated modes in 
the data themselves, we remove variance surgically-leaving 
the most remaining degrees of freedom for signal to transfer 
through. 

Most of the structure in Figure E] is due to instrumental re¬ 
sponse. This section reviews the beam and spectral calibra¬ 
tion process employed with the GBT data, emphasizing as¬ 
pects rel evant to conta minant modes. Complete details can be 
found in iMasuil (120131) . 

A broadband noise source injects power at the feed point 
and acts as a stable flux reference. We switch this noise source 
with a rapid 64 ms period that allows the measured calibration 
signal to be uncontaminated by sky signal and RFI. The data 
from a single scan (which for the wide-field data used here is 2 
minutes in length) are referenced to the mean amplitude of the 
noise calibrator signal. The noise calibrator itself is assumed 
to be perfectly stable. The noise calibration (a time trans¬ 


fer standard) is then referenced using a collection of scans of 
well-characterized, bright, unpolarized point sources such as 
3C 48, 3C 295, and 3C 147. This procedure is performed inde¬ 
pendently in each spectral channel and for the power from the 
X and Y polarized antennae. Separately calibrating X and Y 
signals mitigates leakage of polarization into the summed un¬ 
polarized intensity. Analysis of the point source data suggests 
that the primary source of polarization leakage on boresight is 
due to a difference in gain between the X and Y antennae. 

The above procedure achieves 0.5% uncertainty per band 
over 1 minute of integration across 84 total hours of integra¬ 
tion in the GBT wide field survey. The data used here assume 
that the noise calibrator is completely s table over the obser¬ 
vation. Similar systems (lBrvertonll201 ll) are known to vary at 
the ~ 1% level. Future work must characterize the stability of 
noise calibrator and the covariance of its spectrum. Variations 
in the calibrator’s spectral structure could result in a prolif¬ 
eration of contaminant degrees of freedom, while common 
mode variations can be more benign. The derived calibration 
in our GBT data varied at the 1%-level and may be partly 
attributable to the calibration source rather than receiver sta¬ 
bility. 

The GBT prime-focus beams have significant polarization 
leakage off boresight because of the off-axis design. Polariza¬ 
tion leakage between linear polarization and total intensity is 
of order 5% of the primary gain. The polarization leakage can 
cause spectral structure in the foreground (and thus additional 
degrees of freedom) in two ways: (1) Faraday rotation of po¬ 
larized synchrotron emission can vary across lines of sight 
and mix to frequency structure in the intensity spectrum and 
(2) leakage from polarization to intensity has its own spec¬ 
tral structure due to the instrument. The latter is conclusively 
observed in the third mode of Figure |4] 

Mitigation of polarization to intensity leakage is a subject 
for future work. A promising avenue is that the Mueller leak¬ 
age beams are observed to be approximately odd functions 
about the boresight axis. Smoothing the maps above the beam 
scale tends to suppress this leakage. Spatial smoothing de¬ 
creases the number of degrees of freedom that the cosmolog¬ 
ical signal can exercise. We will argue below that there is 
a significant penalty for fewer signal degrees of freedom be¬ 
cause of spurious correlation of the signal and foregrounds. 
The most promising approach may be to pursue the “deprojec- 
tion” approach used in the BICEP CMB pola rization analysis 
(iTakahashi et al.ll200M iKaufman et al.ll2014^ where contam¬ 
ination from leakage could be removed in the time domain 
data using a model of the response and polarized emission. 

5.2. Perturbative Expansion for Signal-foreground 
Correlations 

Write out the cleaned map for small signals as 

Xclean = n,"+f(X,+Xf) (42) 

= [1 - (Uf + A)S(Uf + A)^](X, + Xf) 

«nf'Xs -f nf'Xf - ASUfXf - UfSA^Xf. 

Here H^X^ and HJ'Xf are signal and foreground cleaned by 
pure foreground modes and are analogous to the direct loss 
in Equations iTSl and 1^ respectively. Combine the remaining 
terms as 

Xs = -ASUfXf - UfSA^Xf. (43) 

The combination Ilf Xf -b Xg represents residual foregrounds 
in the map after the empirical foreground cleaning is applied. 
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Figure 4. First five eigenvectors of the u, v' covariance in the GBT-wide 
field, descending in amplitude from the top. The largest mode is the mean 
synchrotron emission across the map. Despite best efforts at calibrating, there 
is visible small structure and non-power-law behavior. The second-largest 
mode is due to the chromatic beam response and can be effectively nulled by 
convolving the maps to a common resolution. The third mode has amplitudes 
that agree well spatially with polarized emission in the survey area, and the 
frequency structure corresponds approximately to the frequency dependence 
of polarization to intensity leakage. Beyond this mode, no clear instrumental 
response systematics can be discerned. Possibilities are calibration instability 
or variations in rotation measure appearing through polarization leakage. To 
reach the level of the signal, 30 modes need to be removed. The spectral 
modes are very poorly approximated by a series of smooth polynomials. 


In Ilf Xf, both Ilf and Xf are assumed to be unrelated to 
the signal, while the second term Xg is a scaled version of 
the perturbations to the foreground modes due to the signal, 
A. Considering the minus signs, these residuals are anti¬ 
correlated with the signal. We will show that this term bi¬ 
ases the band powers in a way that does not average to zero 
over many signal realiz ations, analogous to the ILC bias (e.g. 
lEfstathiou et al.l (l2009h '). 

When estimating the impact of empirical foreground clean¬ 
ing on the signal, it is insufficient to consider only the direct 
loss of the signal as Ilg^^fXs. One must also include the im¬ 
pact of spurious correlations of the signal and foregrounds. 
These terms are only manifest if we simulate Ilg^^f (Xg -F Xf), 
including the foregrounds. 

As before, to get a rule of thumb, take the cross-correlation 
between the cleaned map and pure signal, as 

eclean = iV-l(Tr(XcieanXf)) (44) 

= iV,-i(Tr(nrXgXn) + iV,-i(rr(XgXf)), 

where () is over signal realizations. We have neglected the 
piece (Tr(nf XfXj^)) because the foregrounds cleaned with 
pure foreground modes I^Xf have no expected correlation 
to the signal. Equation describes the first term as being 
the signal and some direct loss. However, now that fore¬ 
ground modes are estimated from the map itself, the final term 
Vg"^(Tr(XgXj’)) describes the spurious average correlation 
of foreground residuals and the signal. Writing these out, 

Cclean — Cs ~b ^direct “F Cspur- (45) 

We now calculate .^gpur in a perturbative limit and show that 
it significantly impacts the signal. At first order, the per¬ 
turbed vectors Ug+f are a linear combination of the pure fore¬ 
ground eigenvectors Uf times a weight related to the per¬ 
turbation. For small signals, the perturbation to the covari¬ 
ance is Ca = W^^(XfXj’ -F XgX^), related to the spuri¬ 


ous correlation of signal and foreground over a finite patch 
of the sky with Nq samples. According to first-order pertur¬ 
bation theory, the vectors 5i in the perturbation matrix A of 
Ug+f = Uf -F A are 


■5.=!: 

3 




(46) 


Recall that the foreground eigenvectors {Wf} are assumed to 
have full rank and are the columns of Uf. Through some 
algebra (Appendix lAli. 


Cspur ^0 


\Incuts 
j^cuts 


[(«D^(XfX^ + XgXf)«f 

AI-A5 


(47) 

The terms XfX^ describe spurious correlation of signal and 
foreground and fluctuate about zero across signal realizations 
in the () average. The fact that this expression is squared in¬ 
side the 0 means that there is a net anticorrelation of the fore¬ 
ground residuals with the signal itself. This anticorrelation 
depends in detai l on the particular foreground Xf in the map 
region. Section 15.31 considers this numerically for the GBT- 
wide survey, but it is convenient to have an analytic rule of 
thumb for the signal that is attenuated as a function of number 
of modes removed Nm- Under assumptions similar to Equa¬ 
tion!^ Appendix lAl finds 


^clean — (1 


N„ 




1 - 




iV, 


res,0 


6, 


(48) 


where Wj-es.e are the number of modes in the fre¬ 

quency (fc||) and angular {k±) directions, respectively. The 
general scaling 1 — Nm/N-ces,v is intuitive: if there are only 
10 spectral degrees of freedom in the signal and we need to 
remove 10 spectral modes, no signal remains. The depen¬ 
dence on the angular component is less intuitive because our 
cleaning operation acts entirely in the frequency direction. In¬ 
deed, if the foreground modes were taken as given, the signal 
loss scales as Equation!^ The dependence on the number 
of angular modes arises as a by-product of measuring 

the foreground spectral modes from the map itself. The co- 
variance C = Nq'K.'KJ' measures foreground spectral modes 
against a limited number of signal realizations. In general, if 
the same foreground mode is observed against many different 
signal realizations, the spurious correlation averages down. 

Equation |4^ is only a rule of thumb, and simulations are 
needed to effectively measure the number of resolution ele¬ 
ments N^es,v and N^es,e- These quantities are not simply 
and Ng, the number of spectral and angular pixels in the sur¬ 
vey. Instead, Nres,i, and Nres ,0 roughly relate to the number 
of modes in the signal at the k scales of interest. Signal com¬ 
ponents at the lowest k± of the survey have large wavelengths 
and few Nres ,0 modes in the survey volume. The spurious 
correlation of signal and foreground does not average down 
over many signal modes, and so most of the large-scale signal 
is lost. The impact of foreground cleaning on the signal de¬ 
pends strongly on both k± and fey, and so must be treated as a 
2D transfer function. 

From the point of view of survey design, one wants to max¬ 
imize the number of resolution elements N^es,v and N^es, 0 - 
The beam size Axes the ultimate sensitivity at high k±, and 
having high N^es ,0 translates into large surveys with many 
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beam spots. In general, intensity mapping surveys will need 
to cover larger areas than Fisher estimates from thermal noise 
suggest, owing to the fact that potentially many degrees of 
freedom in the data need to be used to estimate foregrounds. 


5.3. Estimating the Signal Loss Transfer Function 

The previous section argued that when contaminated modes 
are discovered from the data themselves, residual foregrounds 
will be anticorrelated with the signal (on average). The means 
that to assess signal attenuation due to foreground cleaning, it 
is insufficient to apply the clean ing t o signal-only simulations 
and measure loss as in Section 13.61 Instead, the foreground 
cleaning should be applied to simulations of signal plus fore¬ 
grounds. A reasonable approach is to add signal simulations 
tCsim to the measured map itself, x (which is assumed to be 
dominated by continuum foregrounds). One can then subtract 
the foreground power back from the band power, as 

fclout = [C"^ng+f(a; -f a;sim)]'^QaC~^ns+f (at -f x^im) 

-{C-^U}xfQ^C-^n}x, (49) 

where the modes of the cleaning operation Ilg^f are deter¬ 
mined from the map of real data plus simulations (x -b atgim) 
and Uj is determined from only the real data, taken to define 
the foreground modes. Equation|49]represents the band power 
estimate of the remaining signal simulation, and the transfer 
function can be estimated as the ratio of this quantity to the 
input simulation band power weighted by the thermal noise 
of the survey, or 


^alin — ^sim) X^ 


rpF / 9a I out 

^ ^ ■ I 

Q_ol I in 


(50) 


This is analogous to Equation |26] but includes spurious cor¬ 
relation of signal and foreground. Cosmological signals with 
poorly understood amplitudes (such as when a cross-power 
with a galaxy survey is not available) may need a range of 
signal simulation amplitudes to understand any sensitivity of 
the transfer function. 

The opti mal number of modes to remove is described next 
in Section 15.41 To be able to compare outcomes for differ¬ 
ent numbers of modes removed, the transfer function needs 
to be calculated for a range of scenarios. Especially for too 
few modes removed, the residual foreground variance may 
be significant. In this case, the estimate of ()a|out is noisy 
and Equation 1^ requires an average over many realizations 
to converge satisfactorily. 

Write out the cleaned map of Equation @3] in x rather than 
X notation as 


^clean — n-s_|_f(a} -f iUgini) ~ ~b njajgim “t” ^sim; (51) 

where again Xsim is the critical piece of residual foregrounds 
that are anticorrelated with simulated signal. On average, the 
term H/ X is just producing variance and contains nothing 
related to the signal, so it can be subtracted. We can then 
form Ilg+fja; -b tCgim) -UfX = UfXsim + 5sim to get just 
those pieces relevant to understanding how the signal acts un¬ 
der foreground subtraction. Signal loss can then be assessed 
in the cross-power between this cleaned signal map and the 
input signal 

9a|/oss — [ng_|_f(3; -b a?gin;i) ITjx] C Qa^sim- (52) 

We refer to this as a one-sided estimate of signal loss, because 
n is only on one side of the quadratic estimator. The numer¬ 


ator of the transfer function needs the expectation value of 
signal with foreground cleaning applied to both sides. We can 
rewrite this using a relation between trace and covariance as 

rr(c^„n^c-ic,aC-in) (53) 

= ^Cov{x^U^C-^C^aX, x^U^C-^C^ax) 

where x is normally distributed with covariance 1. Eor 
Gaussian x, the covariance can be approximated as the 
P^Sa,fi, where Pa is the mean power spectrum (in 
this case (a:^n^C”^C Q,a;)) and v{k) is the number of 
modes in the survey volume. A similar expression holds for 
the denominator of the transfer function. The numerator can 
be estimated in Monte Carlo with lower noise^^ removing 
the residual foreground variance as in Equation|52l The v{k) 
pre-factors drop out in the ratio, and the transfer function can 
then be expressed as the average of the ratios 

[T[g_|_f(tC “b ajgim) ^ Qa^sim 

^Qa^sim 

Equation [^provides a convenient procedure for estimat¬ 
ing signal attenuation due to cleaning empirically determined 
spectral modes. 

Eigure|5]shows the transfer function for the GBT-wide data 
(including the beam). At high k±, signal is lost to large 
beam size. The signal attenuation at low k± and low fey 
is due to spectral foreground cleaning. Signal attenuation 
toward low fey can be explained as “direct” loss of remov¬ 
ing functions along the line of sight t hat have overlap with 
the signal’s spectral variation (Section [4.2l) . The structure at 
fcy = 0.07 /iMpc~^ originates from additional foreground de¬ 
grees of freedom around that scale that needed to be nulled. 
The signal loss at low k± is less intuitive and is due to the 
spurious spatial correlation of signal and foreground on large 
angular scales. Each foreground spectral mode ul discovered 
in the survey is associated with a spatial mode through the sin¬ 
gular value deco mposition (SVD) X = (see 

Appendix [Aland iNitvanandal (I20l0l) l. The spectral functions 
nulled in the line of sight are associated with spatial modes 
of the signal. The largest foregrounds are spatially smooth 
and share some spurious correlation with the small number of 
spatial signals at low k±. Nulling the spectral variation asso¬ 
ciated with these spatial modes erases the large spatial scales. 
Signal loss toward low k± is an artifact of nulling spectral 
modes determined from the map itself. 

5.4. Aggressiveness of Foreground Cleaning 

Eor residual foregrounds to be negligible, (I) the fore¬ 
ground must inherently have few spectral degrees of freedom 
compared to the signal and (2) instrument response must be 
well constrained and mix bright foregrounds into a limited 
number of new modes. Some instrument responses like cal¬ 
ibration stability will inevitably produce a long tail of new 
modes where each line of sight observes a slightly different 
continuum. This produces high-rank residuals that cannot be 
easily estimated or subtracted. 

Until proven otherwise, interpretation of intensity mapping 
power spectra should account for the additive bias from resid¬ 
ual/unweighted foregrounds from misspecification or incom¬ 
plete knowledge of the foreground covariance. The goal of 
good instrument design and calibration is to push the ampli¬ 
tude of residual modes well below the thermal noise in the 










METHODS FOR SINGLE-DISH INTENSITY MAPPING 


13 


a 

Oh 



0.225 

0.200 

0.175 

0.150 

0.125 

0.100 

0.075 

0.050 

0.025 

0.000 


[/iMpc 


Figure 5. Transfer function of signal attenuation for 30 foreground modes 
removed in the GBT-wide field, including the effect of the beam. When con¬ 
taminated modes are determined from the data themselves, it is essential to 
include foregrounds in simulations of the signal attenuation. This can be 
viewed two ways: either (1) the signal perturbs the foreground modes or (2) 
spurious con'elations between signal and foreground need to be included. Es¬ 
pecially on fc-modes that approach the size of the survey region, there are 
few modes available and most of the signal is lost. To reach the lowest upper 
bound of the signal auto-power, foreground cleaning also removed much of 
the signal. The beam rolls off response at high fcx. ^nd the signal attenuation 
at low fcx and fey is due to the spectral foreground removal. 


maps, so that the biases are at the level of the power spectral 
errors. (For some goals with particular P{k) shapes like the 
baryon acoustic oscillation, requirements on the additive bias 
may be weaker.) 

Cross-correlation with a spectroscopic galaxy catalog has 
the advantage that residual foregrounds will boost errors 
rather than producing an additive bias. Galaxy densities and 
line intensity fields will not be perfectly correlated, so cross¬ 
correlation provides a lower bound on the fluctuation power in 
the line survey. Combined with the auto-power of the line sur¬ 
vey (which is an upper limit due to its additive bias), an inten¬ 
sity survey can provide an indirect infer ence on the range o f 
true fluctuation power in the atomic line (ISwitzer et al.ll201^ . 

Foreground cleaning requirements vary between auto¬ 
power surveys of intensity maps and cross-powers with side 
surveys. For too few modes removed, the cross-power errors 
are large because of the map variance from foregrounds. By 
removing more than the optimal number of modes, the erro r 
bars increase again as more signal is lost (iMasui et al.ll2013h . 
The optimal cleaning is achieved for the lowest errors on the 
cross-power, a well-defined quantity. Similarly, the additive 
bias on the auto-power of the intensity mapping survey drops 
as more modes are removed, but errors again increase. The 
upper bound on the line fluctuation intensity is the band power 
plus the error. One can choose a number of modes to remove 
that gives the lowest upper bound on the auto-power ampli¬ 
tude. Generally, the foreground cleaning requirements for the 
cross-power with a side survey are much less stringent than 
the auto-power of the intensity survey. Each Fourier mode 
of the cross-power has a different realization of the residual 
foregrounds crossed with the galaxy survey. In a survey with 
many fc-modes contributing to a band power, the cross-powers 
can average down over these correlations to produce small er¬ 
rors. In contrast, each mode of the auto-power of the intensity 
map with itself is a sample of approximately the same power, 
which does not average down. 

Figure |6] shows a high-level summary of the plausibility of 


detecting the auto-power in line intensity surveys. It is clear 
that one wants (1) a survey that has many signal degrees of 
freedom in the fc-modes of interest and (2) a shallow eigen¬ 
value spectrum of the foregrounds. These two goals can be 
contradictory. The first goal suggests coverage of wide areas 
of the sky to suppress spurious correlations between signal 
and foreground. However, in surveying large areas, an in¬ 
strument may observe a wider range of foreground spectral 
modes or be prone to instrumental effects that are harder to 
control across large areas and time. These effects generally 
boost the number of modes that need to be estimated. Possible 
examples of these effects are spatial variations in spectral in¬ 
dex, variations in bandpass calibration, or exposure to a wider 
range of rotation measures through polarization to intensity 
leakage. 

CMB B-modes have had a vigorous histor y of studies of in¬ 
strumental systematics (e.g. lHu et alJ(l2003h I. Similar studies 
should be undertaken for intensity mapping surveys. These 
would characterize the impact of calibration stability, beam 
response, or other instrumental response to foregrounds. This 
requires detailed simulations that are beyond the scope of this 
methods paper. Foreground emission and variations in spec¬ 
tral index may contribute a handful of spectral degrees of free¬ 
dom, but instrumental effects have the potential to mix these 
into many more bright degrees of freedom that are not well 
known in advance. In future studies, the eigenvalue spectrum 
of input astronomical foregrounds can be compared to the 
eigenvalue spectrum as observed by a simulated instrument. 
A well-designed survey will control the number of modes in¬ 
duced by the instrument response, targeting the rapidly falling 
green contours in Figure |6] 

6 . ASSEMBLING THE EINAL PRODUCT 

Previous sections defined the core aspects of the estima¬ 
tor and foreground cleaning. In practice, the estimator can 
be made more robust to temporally variable noise by forming 
cross-powers between subseasons. The 2D band powers de¬ 
rived above are expected to have low signal-to-noise ratio in 
the first generation of experiments. They are also are difficult 
to interpret. We describe a procedure to optimally bin onto 
ID powers and develop a covariance model and decorrelation 
of those powers for convenient display. The binning weights 
from 2D to ID provide insight into the information content of 
the intensity survey. 

6.1. The Subseason Cross-power 

Thermal noise is uncorrelated between subseasons of the 
observations to an excellent approximation. In addition, some 
forms of contamination such as time-varying REI, or calibra¬ 
tion instability (and its induced foreground residuals), may 
be largely uncorrelated across times. The cross-powers be¬ 
tween Ns split subseasons of the data xa--- Xn. therefore 
have no additive noise bias from these terms (iTristram et al.l 
I2005I ISwitzer et ^120131 iDillon et'dl 120141) . A subseason 
cross-power estimator is formally nonoptimal, but does follow 
from the form of the optimal quadratic estimator. AppendixiBl 
shows the choices that lead to the estimator 

cx (55) 

where and are the nonsignal noise covariance in two 
maps. The data are weighted by their respective covariances 
for the two subseasons. Following the form of Eauationl2^to 
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Figure 6. Intensity mapping foreground regimes. The signal power at a fixed k band power is a constant (black) normalized to 1. The eigenvalue spectrum of 
foregrounds is approximated as a power law, with a steep (green) and shallow (red) decline from 10"^ times the signal (in power, no t ma p space), as more modes 
are removed. EiTors on the band power (blue) increase as more signal is removed by foreground cleaning, according to Equation 1481 The noise level is set to 
achieve a 5cr detection of the signal at zero modes removed. Residual foregrounds are also boosted by the signal loss transfer function. Dashed red and green 
show the decline of the intrinsic foreground variance as more modes are removed. The solid lines show the residual foreground variance with transfer function 
applied to keep signal constant. A successful experiment will have foregrounds below noise and noise below signal. The left panel shows an experiment that has 
80 spatial degrees of freedom for this k mode. This experiment will only be successful if the foreground eigenvalue spectrum falls rapidly (green). Note that 
for the shallow spectrum, the contamination at its minimum is hundreds of times the signal and larger than the noise. The right panel shows an experiment that 
has 250 degrees of freedom for the k mode. In this case, there are sufficient modes to secure a detection even for the shallow foreground eigenvalue spectrum. 
This does not include the impact of noise on contaminated mode determination, which generally limits the maximum number of modes that can be removed, or 
alternately how fai' the foreground can be suppressed relative to thermal noise. 


avoid contaminated modes, 

OC (WAllAXAfC^^iWBllBXB). (56) 

This expression continues to assume that after projecting the 
contaminated modes through Haxa, the remaining covari¬ 
ance is dominated by thermal noise, which simply follows the 
coverage map of the experiment in the two subseasons and 
can be implemented as diagonal inverse covariance weights 
and W b- Take the projections to remove spectral modes 
along each line of sight as in Eauationl29l 

= uec(n^XA) = vec{{l - UajSU^ f), X^) 

(57) 

where now the contaminated modes Uyi.f can be particular to 
the subseason. (If the instrument noise varies with frequency, 
that weight should appear here as in Equation |27] We neglect 
it to keep a simpler equation.) 

We can also develop foreground modes that are best tuned 
to the respective data splits by finding modes of the cross¬ 
covariance of the two subseason maps as 

c,y = XaXI = U^,f (58) 

where the final operation is the SVD of the cross-variance. 
The covariance estimat e can be improved by using the 
weighted maps Wax a (ISwitzer et al.ll20l3l) . However, in 
doing this it is important to force to be a separable ex¬ 
pression as (g) by averaging over the frequency and 
spatial directions. Otherwise, the weighting operation would 
increase the rank (e.g. nonseparable W applied to X = 
is no longer generally rank-1). 

Most instruments have a chromatic beam. Without ac¬ 
counting for this, the modal structure will include variance 
introduced by this frequency response. In GET, this was the 
second-largest mode (Eigure |4]i and was treated by convolv¬ 
ing the maps to a common resolution. In general, any well- 
modeled aspects of the instrument should be treated in the 
map to reduce the burden of discovering those modes in the 


data. 

The estimate is the average over all cross-power pairs 


<?a = 






(59) 


We have assumed that all pairs have similar statistical power. 
If thermal noise dominates, this can be arranged by forming 
all of the subseason split maps from approximately the same 
integration times and areas. If this is not possible, then the 
cross-powers should be more optimally weighted. 


6.2. Projecting to ID Powers 


The optimal estimator structure developed in Section 13.41 
binned from 3D Eourier space to the 2D band powers as¬ 
suming that the noise was isotropic across constant = 

-f k'^ under the action of C“^. Eor the first generation 

of intensity mapping experiments, there is likely to be insuffi¬ 
cient signal-to-noise ratio on each 2D band power. Binned ID 

power P{k) for a band of .-f A:| € {k,k + Ak) contains 


most of the cosmological information other than redshift- 
space distortions and so provides a convenient summary. 

Noise is strongly anisotropic across the k± and fc|| ring that 
contributes to some k. This is because the number of modes 
increases quadratically in k± and linearly in k\^, the trans¬ 
fer functions will vary across the ring in 2D fc-space, and the 
noise itself may vary across the ring. These factors also make 
it difficult to interpret the 2D power spectrum. In going from 
2D to ID powers, we can apply an inverse-covariance weight 
to maximize the ID signal-to-noise ratio. Let a continue to 
refer to 2D band powers k±^a, fc||,a and Rk be the ring of k^ 
values that contribute to the k band power. The weighted sum 
is 


P{k) 


Ea laCiRk ^aPa 


(60) 


Eigure |7] shows the indicator la^Rk as a set of colored bands 
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for each bin. Again the choice of Wa is a weighting related 
to optimality rather than bias so long as Wa is based on the 
variance rather than the mean value of the bins. In contrast, 
an analysis should not use Wa that does additional foreground 
suppression by masking areas of the 2D power that have ap¬ 
parent residual foregrounds in the mean band power, as this 
would produce bias. 

In the language of subseason data splits, Pa^^ is an auto¬ 
power, and it describes the variance of thermal noise, cosmo¬ 
logical signal, and residual foregrounds. In practice for GBT- 
wide, this was dominated per ka bandpower cell by thermal 
noise. Assuming Gaussian errors from the dominant thermal 
noise, the inverse covariance weight per pixel is 

N 

Wa = (61) 

^^auto,Q! 

where is the number of 3D Fourier cells contributing to 
the 2D power in band power a, is the total transfer func¬ 
tion that applies to the band power a, and Pauto.a is estimated 
from the 2D auto-powers. Figure |7] shows Wa for the GBT- 
wide survey. Prior to these measurements, the theoretical ex¬ 
pectation was that foreground spectra vary slowly and that 
primarily information at low fc|| is lost. In the case of fore¬ 
ground modes determined from the data themselves, large an¬ 
gular scales have more spurious correlation with foregrounds, 
and considerable k± is also lost, resulting in deweighting of 
low k±. Interband correlations are estimated in the next sec¬ 
tion using simulations, but are not exploited in the 2D to ID 
weighting. 

6.3. Errors and Decorrelation 

A simple estimate of the band power errors can be derived 
from the variance of cross-power pairs of subsets of the data. 
In the case of a survey split into maps A, B, C, and D, the 
cross-powers Ax B, AxC, Ax D, B xC, B x D, Cx D form 
six unique samples of the instrument’s thermal noise. The 
subseason maps A, B, C and D all share the same underlying 
signal and residual foregrounds, so the variance of the crossed 
pairs does not reflect the sample variance. It is also a poor 
estimate of the errors if there are a limited number of crossed 
pairs (six in the example here). 

If the cleaned maps are dominated by Gaussian fluctuations 
such as from thermal noise, the full bandpower covariance 
can be determined from the measured auto-power spectrum 
between common sections (e.g. A x A), the power spectrum 
across sections, and the survey geometry (see, e.g. iDas et all 
(1201 Ih for an application in CMB analysis). The validity of 
Gaussian errors depends on both the signal and residual fore¬ 
grounds. If a survey resolves sufficien t non-Gaussian cosmo¬ 
logica l signal, then methods such as iHarnois-Deraps & PenI 
(12012^ should be considered to properly describe the errors. 
To date, no intensity maps have reached this high cosmolog¬ 
ical signal-to-noise regime. Appendix ICl calculates the full 
band power covariance cov(P, P) using a hybrid of Monte 
Carlo simulations for the off-diagonal structure and Gaussian 
errors for the amplitudes. 

With the optimal estimator, both the final covariance and 
the window function a re the Fisher matrix. Standa rd discus¬ 
sion of decorrelation (iHamilton & TegmarkI l200dh for opti¬ 
mal estimators freely moves between undoing the effect of 
the window function through and diagonalizing the fi¬ 
nal band power covariance through For the subop- 

timal estimators here, the variance 2Tr(CQcCQ^) is dif¬ 


ferent from the windowing matrix Tr(C ^jQq,). The band 
powers Pa so far have just been scalar normalizations times 
the pseudo-powers qa rather than a linear combination that 
decorrelates the band powers. 

With the full band power covariance model in hand 
from Appendix |0 we can repeat the classic decorrela¬ 
tion choice (IHamilton & Tegm^l2000l) of F~^/^ by taking 

^ /s _ 1 /o 

cov(P, P)(jata to multiply by the band powers (normalized 
so that the weights on the band powers sum to 1). So long as 
decorrelation multiplies by an invertible matrix, no informa¬ 
tion is lost and the choice is purely one o f display, which gen¬ 
eral ly benefits from unc orrelated errors. ISwitzer et al.l (120131) 
and iMasui et"^ (120131) use this pipeline for two GBT inten¬ 
sity mapping surveys and describe results and the interpreta¬ 
tion of ID band powers subject to additive bias. 

7. DISCUSSION 

Intensity mapping experiments have the potential to map 
cosmological volumes with resolution and sensitivity require¬ 
ments that are modest compared to direct spectroscopic sur¬ 
veys of objects. In addition to atomic or molecular line ra¬ 
diation, these surveys generally receive continuum radiation 
that can be orders of magnitude brighter. We have devel¬ 
oped a quadratic estimator that combines some aspects of both 
galaxy and CMB surveys, but also accommodates methods of 
down-weighting bright continuum emission. A fully optimal 
estimator requires a model of the covariance of contamina¬ 
tion, which we argue is not well known prior to an experi¬ 
ment. In the example of GBT data, the spectral structure of 
contaminant modes was related primarily to the instrument 
response rather than intrinsic spectra. The instrumental re¬ 
sponse is residual in the sense that considerable new effort 
was put into calibration, in addition to the heritage of a well- 
established instrument. 

We develop an estimator that (1) takes cross-powers of sub¬ 
seasons to avoid bias from temporally-variable noise (2) es¬ 
timates a foreground covariance model from the data itself 
through a reduction in dimensionality, (3) accounts for the im¬ 
pact of spurious correlations between signal and foreground, 
and (4) derives the final ID power and its errors. Trans¬ 
fer functions provide a convenient way to calibrate the esti¬ 
mator’s output to the signal input and can be estimated effi¬ 
ciently using Monte Carlo simulations. Spurious correlations 
of signal and foreground result in an average anticorrelation 
of signal and residual foregrounds. Simulations for the trans¬ 
fer function must include foregrounds to properly account for 
this effect. 

Figure |6] argues that for an intensity mapping experiment 
to be successful, it must control the eigenvalue spectrum of 
foregrounds and observe a large enough area that spurious 
correlation between signal and foreground can average down. 
While the intrinsic foregrounds may only have a handful of 
degrees of freedom, variations in instrumental response have 
the potential to mix those spectral modes into a larger number 
of new modes and a shallower eigenvalue spectrum. In par¬ 
ticular, variable spectral calibration contributes some level of 
full-rank covariance (each line of sight responds differently to 
bright emission), even with rank-1 input contamination. The 
eigenvalue spectrum of the z/, v' covariance of the maps is 
the central metric for the quality of the calibration or map¬ 
ping procedure. Because of instrumental effects, contaminant 
modes are not necessarily smooth and so generally poorly de¬ 
scribed by smooth functions. The salient aspect here is not 
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Figure 7. Left: binning from 2D to ID powers occurs in bands along constant k = .^A;^ + fcy for 15 bands between k = 0.1 /iMpc“^ and 1 /iMpc“^. 

Constant colors show the 2D fc-cells contributing to a ID band power. The bins are constant in log(fc). Right: the noise weight from Equation 161 1 for the 
GBT-wide survey. The weight bins per 2D bandpower are normalized to sum to 1 across the ID fc-bands in the left panel. The colored bands in the left panel 
guide the eye for 2D k bins that contribute. At low fc ^ 0.1 hMpc“^, the weight favors slowly varying spectral modes and more rapidly varying spatial 
modes (the vertical part of the iso-fc contour). This reflects the fact that the foreground cleaning destroys spatial modes with kj_ <0.1 hMpc“^ because 
the bright foreground modes are associated with spatially smooth structure in the map. In contrast, long-wave spectral modes are less penalized because the 
contaminated modes have high-frequency spectral structure (only the largest mean synchrotron mode is spectrally smooth). In contrast, by fc 1 )iMpc“^ 
almost all information comes from rapidly varying spectral modes (the horizontal part of the iso-fc contour) because the beam has destroyed essentially all of the 
information at k± > 0.4 hMpc“^. For intermediate scales, information at a wider range of modest k± and A:|| is weighted more heavily. The noisy stiucture of 
the weights is driven by variations in the auto-power in the denominator of Equation [ST] 



the spectral smoothness but rather that the signal can fluctuate 
in many more ways than the instrument’s response to bright 
foregrounds. 

Figure |6] also demonstrates one of the challenges of reach¬ 
ing convincing detection using intensity mapping data alone, 
when no cross-power corroboration is possible. An experi¬ 
ment only has access to the band power estimate of signal 
plus foreground, which formally represents an upper bound 
on signal. This total bandpower falls as foregrounds are more 
aggressively cleaned. If signal dominates, the bandpower will 
reach a plateau where errors increase, but the amplitude does 
not diminish as the cleaning pushes to down-weight more 
foreground structure. However, a shallower plateau could 
also result from the fact that residual foreground variance 
is boosted after accounting for the transfer function (see the 
solid red curve, left panel of Figure |6]l. The onus is to argue 
(1) that cosmological signal power is detected and is stable 
to efforts to clean additional foregrounds and (2) that residual 
foregrounds and the signal transfer function do not conspire 
to appear as a stable signal band power. Additionally, there 
may be features in the power spectrum such as the BAO fea¬ 
ture, or redshift-space distortions (in the 2D spectrum), that 


support the interpretation of cosmological signal. 

Intensity mapping shares some parallels with CMB B-mode 
searches, where instruments must be designed to prevent mix¬ 
ing between bright contaminants and the signal, and fore¬ 
ground cleaning is a central strategy. The same language and 
metrics that have been developed for beam systematics in B- 
mode searches would be fruitfully carried over to intensity 
mapping. All of the lines and redshift ranges of interest have 
differences in experimental methodology, but the eigenvalue 
spectrum provides a common reference for developing in¬ 
strumental requirements. Beyond astrophysical foregrounds, 
planning of future intensity mapping experiments should also 
include high-fidelity simulations of the instrument to deter¬ 
mine requirements for the accuracy and stability of the spec¬ 
tral calibration. 
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APPENDIX 

IMPACT OP POREGROUND-SIGNAL COUPLING IN BLIND CLEANING 

In this appendix, we find the expectation value of the correlation between the residual foregrounds in the cleaned map and the 
input signal. Section 15?^ argues that the cleaned maps contain Xg = — ASU^Xf — UfSA^Xf. The term A describes how 
pure foreground spectral modes are influenced by the signal. In this appendix, we argue that Xg is anticorrelated with the signal, 
on average. At first order in a perturbing signal, A = UfH where the matrix elements are 


HI,;, = W 


,(MD^(XfXg^ + XgX^ 


)“j 




(Al) 
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Here H = -H^, is skew-symmetric because of the denominator of the perturbation element, while the numerator is symmetric 
by construction. The correlation between residual foregrounds in Xg and signal due to spurious correlations is 

6pur = A'e-'(Tr(XgXf)) = -iV- i(Tr (UfHSUfXfXf + UfSH^UfXfXj’)) (A2) 

= -Afg-i(rr(HSU^XfXfUf -f SH'^U^XfX^Uf)) (A3) 

=-N^^Tr {nSUf (XfXj’ + XgXf )Uf)), (A4) 

where the first line uses A = UfH, the second line uses the cyclic property of the trace, and the third line uses the fact that 
a matrix and its transpose have the same trace, and the symmetry = S. Let S = (XfXj^ + XsX^)Uf and note that 
H|y = Ng^Tilij ■ (Aj — Here S is a symmetric cross-variance of the signal and foreground in the basis of the foreground 

modes. We can split all matrices in the foreground mode basis into cut modes and uncut modes. The matrix S is 1 along the 
diagonal for cut modes and zero elsewhere. Order the cuts so that they all reside in the upper left submatrix as 


S = 





(A5) 


where c and u denote modes that are cut vs. uncut, and analogously for H. The trace evaluates to 

Cspur = ^(Tr(HccScc) Tritluc'^cu))■ (A6) 

The first term is the trace of the product of skew-symmetric and symmetric matrices, so it is zero. Evaluating the second term 
using the symmetry of S produces 


^spur — 



(A7) 


taking the transpose of H and reversing the denominator to preserve sign. The elements of the spurious correlation S have zero 
mean (over signal realizations), but the correlation between signal and cleaning residuals in ^spur appears quadratically inside the 
average over signal realizations, resulting in a net bias. 


Take the SVD of Xf, Xf = so that 


Cspur — Ng 






\ incuts 
j^cuts 




(A8) 


By the construction of the filter, A[ ^ A^ for i € cuts and j ^ cuts because the cuts remove the highest variance foreground 
modes. In this limit. 



(A9) 


j^cuts 


The amplitudes of foregrounds drop out, and the matrix element is the overlap of the signal with the subtracted 

foreground spatial modes and unsubtracted spectral modes. If the subtracted modes have a smooth spatial distribution, spurious 
correlations will wipe out signal at low k±. Using the SVD of the signal Xg = \/A^AeM|(uf)^, 



j^cuts 


(AlO) 


To expand the squared sum on n, note that cross terms with n ^ n' will average to zero in the () over signal. Recall that both 
the spatial and spectral modes are normalized so that u^u = 1 and v^v = 1. To get a rule of thumb, let Nres,L> be the effective 
number of spectral degrees of freedom of the signal fluctuation. Then each inner product fv 1 /Nres,v and the spatial 

inner products-squared scale as Ri 1/Nres,e- Let the sum on z € cuts be over Nm cut modes. If there are Nres,u spectral modes 
available in the survey, then the number of uncut modes in the sum j ^ cuts is over Nre8,v — -Am- Using the sum of the signal 
eigenvectors from EguationlMl the scaling of ^spur gives the rule of thumb 


Sspur 




NmiNres,^^ - Nm) 


(All) 


Recall that f^clean — 'Cs iCdirect “1“ ^spur^ Of 


Cclean — 


1 - 


N N (N - N ) 


Nr, 



Nm \ 
N res,!-' / 


N^\ 


(A12) 
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In interpreting this rule of thumb, it is useful to think about the survey volume filtered onto particular scales for a given band 
power. The terms A^res.i/ and iVrgg.e are essentially unrelated to the number of frequency bins and spatial pixels Ng in the 
survey. Either N^, or Ng could be made arbitrarily large through mapping with finer pixels or a larger number of spectral channels. 
Instead, the relevant quantity is the number of spatial and spectral degrees of freedom that the signal in the given 2D band power 
can explore. If there can be many signal realizations on a given scale, then the spurious correlation with the foregrounds averages 
down better. 


RELATION OE THE CROSS-POWER TO THE OPTIMAL ESTIMATOR 
Split the season into two maps xbY' let the covariance be 


C = 


s + Na Sx a 
S x S + Nsj’ 


(Bl) 


where S = Sx = is the signal covariance and N^, are the noise covariance in the two maps. We assume that 

the noise covariance contains only thermal noise, which is uncorrelated between subseasons. Any signal on the sky (including 
residual foregrounds) is correlated between subseasons, and we absorb it in the signal covariance. The optimal estimator remains 

qa OC x"^C~^C^aC~^X. (B2) 

This generically involves combinations of the data like x^Qxa (auto-powers) and x'^Qxb (cross-powers). To avoid noise 
bias, we would like to avoid terms like tc^Qtc^. This is done by 1) neglecting the blocks along the diagonal of C and 2) by 
neglecting the signal covariance contribution to C. These choices are 


0 S 


_ /Na 0 
-[ 0 Nb 

S \ 


ba = Tr(C,„C) = 0. 


(B3) 


S,a 0 

Putting these factors together, the crossed estimator is 

qa OC (N;(^a;A)^S,„(N5^£CB)^. (B4) 

Eormally, this cross-power is suboptimal because it neglects signal correlations in the weighting, and it neglects signal infor¬ 
mation in the auto-power. 


GAUSSIAN ERRORS 

This derivation follows iDas et al.l (1201 Ih except that we do not form explicit map differences to estimate thermal noise. The 


covariance of an estimator Pixj across Gaussian fields i, j is 

1 


COv(Pjxj, Efexi) — /, N \{Pixk){Pj-Kl) P {Pixl){Pkxj)] j 

1X[K) 


(Cl) 


where v{k) is the effective number of modes that enter the average for the band power. Model the power spectra as {Pixj) = 
Pauto for i = j and {Pixj) = Px for i ^ j. Eor simplicity, we will assume that each map section has approximately the 
same statistical properties so that the cross-powers are represented by Px (e.g. A x B) and the auto-powers are represented by 
Pauto (e.g. A X A) to a good approximation. However, in surveys where map sections have different integration times or noise 
properties, these expressions should be expanded to break out the noise properties of the different subsurveys. Note that Pauto 
includes both thermal noise a nd s ky variance, and Px includes any sky variance (including residual foregrounds). 

The covariance in Equatior(CT]of several data combinations is 


no sec. in common 


one sec. m common 


!y(fc) 


2 p2 
X 

[P^x 


PX Pauto 


two sec. in common ^ [P^ -p 


(C2) 

(C3) 

(C4) 


An example of the first case would be AB, CD, the second case would be AB, AC and the third case, AB, AB. The total 
covariance of the estimated power spectrum is the sum of the covariance cases above with appropriate multiplicities 


cov(P(fc),P(fc)) = 


1 


Ns{Ns - 1) L 

1 

ix{k)Ns{Ns - 1 ) 


{Ns - 2) {Ns - 3) 




PxPai 


[2{N^ - 3Ns + 3)P2 + A{Ns - 2)Px Pauto + 2P1 


We can cast this in a more familiar form by letting Px = Psig be the “signal” and Pa, 
Separating the signal and noise powers from Pauto one has 


iy{k)cov{P{k),P{k)) = 2P- 


2 

sig ■ 


, PsigPn 
Ns 


+ 2 


p2 


Ns{Ns-l)' 


v{k) ^ 

] . (C5) 

Psig -f Pn “signal plus noise.” 

(C6) 
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These Gaussian errors determine the diagonal of the bandpower covariance and require an estimate of the number of modes 
i/{k). In practice, restrictions of the survey volume due to spatiaPspectral weighting and masking produce a complex covariance 
structure and effective number of modes. We advocate a hybrid approach where the bandpower covariance diagonal is estimated 
through Gaussian errors using the auto- and cross-powers of the data (and so represents sample variance and thermal noise). This 
is then used to calibrate a full covariance matrix determined by Monte Carlo of the complete pipeline. 

Let Csim = cov(P(fc)sim, P{k')sim) be the measured covarianc e of signal plus noise simulations of the data pipeline. Put the 
square root of Gaussian errors derived for the simulation (Eauation lC5b along the diagonal of Agim and likewise for the measured 
data Adata- Then, recalibrate the band power covariance measured in simulations against the variance measured in the data as 


Cdata — AdataAj CsimA- Adata- 


(C7) 


In the operation A^^^ Adata, the common factor i/(fc) drops out of the Gaussian errors of the data and simulations and does not 
need to be calculated explicitly. The simulation pipeline does not need a high-fidelity model of the real data’s covariance. Instead, 
the goal of the simulations is to measure the off-diagonal terms, and the Ax A combinations of the real data give Gaussian errors 
that calibrate the amplitudes. 
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